A data-mining approach to spacer oligonucleotide typing of Mycobacterium tuberculosis
Abstract
MOTIVATION: The Direct Repeat (DR) locus of Mycobacterium tuberculosis is a suitable model for studying (i) the molecular epidemiology and (ii) the evolutionary genetics of tuberculosis. The locus is analyzed with a DNA genotyping technique called spacer oligonucleotide typing (spoligotyping). In this paper, we investigated data analysis methods for discovering intelligible knowledge rules from spoligotyping data, an approach that had not yet been applied to this representation. The knowledge rules were produced by applying the C4.5 induction algorithm. A Prototype Selection (PS) procedure was then applied to eliminate noisy data, which simplified the decision rules and reduced the number of spacers that must be tested to solve classification tasks. In the second part of the paper, the contribution of 25 additional spacers and the knowledge rules inferred were studied from a machine learning point of view. From a statistical point of view, the correlations between spacers were analyzed; they suggested that both negative and positive correlations may be related to potential structural constraints within the DR locus that may shape its evolution directly or indirectly.

RESULTS: By generating knowledge rules induced from decision trees, it was shown that expert knowledge may be not only modeled but also improved and simplified to solve automatic classification tasks on unknown patterns. A practical consequence of this study may be a simplification of the spoligotyping technique, resulting in a reduction of the experimental constraints and an increase in the number of samples processed.
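As a rough illustration of the splitting criterion behind C4.5-style rule induction, the sketch below computes the gain ratio of each spacer in a set of binary presence/absence patterns and picks the most informative one. It is a minimal sketch and not the authors' implementation: the six-spacer patterns, the family labels "A" and "B", and the function names are all hypothetical, chosen only to make the example runnable.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(patterns, labels, spacer):
    """Gain ratio for splitting on one binary spacer (present '1' / absent '0')."""
    groups = {}
    for pattern, label in zip(patterns, labels):
        groups.setdefault(pattern[spacer], []).append(label)
    total = len(labels)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises splits that are very unbalanced.
    split_info = entropy([p[spacer] for p in patterns])
    return info_gain / split_info if split_info > 0 else 0.0


def best_spacer(patterns, labels):
    """Index (0-based) of the spacer with the highest gain ratio."""
    return max(range(len(patterns[0])),
               key=lambda s: gain_ratio(patterns, labels, s))


# Hypothetical toy data: each string is a 6-spacer presence/absence pattern.
PATTERNS = ["110111", "110101", "100111", "111110", "011111", "111011"]
LABELS = ["A", "A", "A", "B", "B", "B"]  # hypothetical strain families

if __name__ == "__main__":
    s = best_spacer(PATTERNS, LABELS)
    print(f"most informative spacer index: {s}")
    print(f"gain ratio: {gain_ratio(PATTERNS, LABELS, s):.3f}")
```

In this toy set, a single spacer separates the two families, which mirrors the idea that fewer spacers may suffice for classification once the decision rules are simplified. A full C4.5 tree would apply this selection recursively and then prune.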
Related resources
Interpreting genotype cluster sizes of Mycobacterium tuberculosis isolates typed with IS6110 and spoligotyping.
Molecular techniques such as IS6110-RFLP typing and spacer oligonucleotide typing (spoligotyping) have aided in understanding the transmission patterns of Mycobacterium tuberculosis. The degree of clustering of isolates on the basis of genotypes is informative of the extent of transmission in a given geographic area. We analyzed 130 published data sets of M. tuberculosis isolates, each represen...
Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods
Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory condi...
Identifying Mycobacterium tuberculosis complex strain families using spoligotypes.
We present a novel approach for analysis of Mycobacterium tuberculosis complex (MTC) strain genotyping data. Our work presents a first step in an ongoing project dedicated to the development of decision support tools for tuberculosis (TB) epidemiologists exploiting both genotyping and epidemiological data. We focus on spacer oligonucleotide typing (spoligotyping), a genotyping method based on a...
Spoligotyping and whole-genome sequencing analysis of lineage 1 strains of Mycobacterium tuberculosis in Da Nang, Vietnam
BACKGROUND Spacer oligonucleotide typing (spoligotyping), a widely used, classical genotyping method for Mycobacterium tuberculosis complex (MTBC), is a PCR-based dot-blot hybridization technique to detect the genetic diversity of the direct repeat (DR) region. Of the seven major MTBC lineages in the world, lineage 1 (Indo-Oceanic) mostly corresponds to the East African-Indian (EAI) spoligotype...
Evaluation of four DNA typing techniques in epidemiological investigations of bovine tuberculosis.
DNA fingerprinting techniques were used to type 273 isolates of Mycobacterium bovis from Australia, Canada, the Republic of Ireland, and Iran. The results of restriction fragment length polymorphism (RFLP) analysis with DNA probes from IS6110, the direct repeat (DR), and the polymorphic GC-rich sequence (PGRS) were compared with those of a new PCR-based method called spacer oligonucleotide typi...
Journal: Bioinformatics
Volume 18, Issue 2
Publication date: 2002